(54) Video image compression using weighted wavelet hierarchical vector quantization

(57) A weighted wavelet hierarchical vector quantization (WWHVQ) procedure is initiated by
obtaining (12) an N x N pixel image having 8 bits per pixel. A look-up operation (14) is
performed to obtain data representing a discrete wavelet transform (DWT) followed by a
quantization of the data. Upon completion of the look-up, a data compression will have been
performed. Further stages and look-ups will result in further compression of the data, i.e.,
4:1, 8:1, 16:1, 32:1, 64:1, etc. Accordingly, a determination (16) is made whether the
compression is complete. If the compression is incomplete, a further look-up is performed. If
the compression is complete, however, the compressed data is transmitted (18). Optionally, it
is determined (19) at a gateway whether further compression is required. If so, transcoding is
performed (20). The receiver receives (22) the compressed data. Subsequently, a second look-up
operation (24) is performed to obtain data representing an inverse discrete wavelet transform
of the decompressed data. After one iteration, the data is decompressed by a factor of two.
Further iterations allow for further decompression of the data. Accordingly, a determination
(26) is made whether decompression is complete. If the decompression is incomplete, further
look-ups are performed; otherwise the procedure is ended.
Description

This invention relates to a method and apparatus for compressing a video image for transmission to a receiver
and/or decompressing the image at the receiver. More particularly, the invention is directed to an apparatus and method
for performing data compression on a video image using weighted wavelet hierarchical vector quantization (WWHVQ).
WWHVQ advantageously utilizes certain aspects of hierarchical vector quantization (HVQ) and the discrete wavelet
transform (DWT), a subband transform.

A vector quantizer (VQ) is a quantizer that maps k-dimensional input vectors into one of a finite set of k-dimensional
reproduction vectors, or codewords. An analog-to-digital converter, or scalar quantizer, is a special case in which the
quantizer maps each real number to one of a finite set of output levels. Since the logarithm (base 2) of the number of
codewords is the number of bits needed to specify the codeword, the logarithm of the number of codewords, divided by
the vector dimension, is the rate of the quantizer in bits per symbol.

A VQ can be divided into two parts: an encoder and a decoder. The encoder maps the input vector into a binary
code representing the index of the selected reproduction vector, and the decoder maps the binary code into the selected
reproduction vector.

A major advantage of ordinary VQ over other types of quantizers (e.g., transform coders) is that the decoding can
be done by a simple table lookup. A major disadvantage of ordinary VQ with respect to other types of quantizers is that
the encoding is computationally very complex. An optimal encoder performs a full search through the entire set of
reproduction vectors looking for the reproduction vector that is closest (with respect to a given distortion measure) to each
input vector.

For example, if the distortion measure is squared error, then the encoder computes the quantity $\|x - y\|^2$ for each
input vector x and reproduction vector y. This results in essentially M multiply/add operations per input symbol, where
M is the number of codewords. A number of suboptimal, but computationally simpler, vector quantizer encoders have
been studied in the literature. For a survey, see the book by Gersho and Gray, Vector Quantization and Signal
Compression, Kluwer, 1992.
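
The full-search encoder, the table-lookup decoder and the rate formula described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four-codeword codebook and the function names are hypothetical and do not come from the source.

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def squared_error(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared-error distortion ||x - y||^2 between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def full_search_encode(x: Sequence[float], codebook: List[Vector]) -> int:
    """Compare x against every codeword (M distance computations) and return the nearest index."""
    return min(range(len(codebook)), key=lambda j: squared_error(x, codebook[j]))


def decode(index: int, codebook: List[Vector]) -> Vector:
    """Decoding is a single table lookup."""
    return codebook[index]


def rate_bits_per_symbol(num_codewords: int, dimension: int) -> float:
    """Rate of the quantizer: log2(M) / k bits per symbol."""
    return math.log2(num_codewords) / dimension


# Hypothetical 2-dimensional codebook with M = 4 codewords (illustration only).
codebook = [(0.0, 0.0), (0.0, 10.0), (10.0, 0.0), (10.0, 10.0)]
index = full_search_encode((9.0, 1.5), codebook)
reconstruction = decode(index, codebook)
rate = rate_bits_per_symbol(len(codebook), 2)
```

The encoder cost grows linearly with the number of codewords, while the decoder cost stays constant, which is the asymmetry the text points out.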

Hierarchical vector quantization (HVQ) is VQ that can encode using essentially one table lookup per input symbol.
(Decoding is also done by table lookup.) To the knowledge of the inventors, HVQ has heretofore not appeared in the
literature outside of Chapter 3 of the Ph.D. thesis of P. Chang, Predictive, Hierarchical, and Transform Vector
Quantization for Speech Coding, Stanford University, May 1986, where it was used for speech. Other methods named
"hierarchical vector quantization" have appeared in the literature, but they are unrelated to the HVQ that is considered
respecting the present invention.

The basic idea behind HVQ is the following. The input symbols are finely quantized to p bits of precision. For image
data, p = 8 is typical. In principle it is possible to encode a k-dimensional vector using a single lookup into a table with
a kp-bit address, but such a table would have $2^{kp}$ entries, which is clearly infeasible if k and p are even moderately large.
HVQ performs the table lookups hierarchically. For example, to encode a k = 8 dimensional vector (whose components
are each finely quantized to p = 8 bits of precision) to 8 bits representing one of M = 256 possible reproductions, the
hierarchical structure shown in FIGURE 1a can be used, in which Tables 1, 2, and 3 each have 16-bit inputs and 8-bit
outputs (i.e., they are each 64 KByte tables).
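
The hierarchical lookup structure can be sketched as follows. This is a minimal illustration, not the tables of FIGURE 1a: the table contents below are a placeholder rule (the average of the two input bytes) standing in for trained VQ indices, and the names `make_placeholder_table`, `lookup` and `hvq_encode` are hypothetical.

```python
P = 8  # bits of precision per input symbol


def make_placeholder_table() -> bytes:
    """Build a 2^16-entry table with 8-bit outputs (64 KBytes).

    Placeholder content only: it averages the two input bytes and is
    NOT a trained vector quantizer table.
    """
    return bytes((a + b) // 2 for a in range(256) for b in range(256))


def lookup(table: bytes, a: int, b: int) -> int:
    """Form a 16-bit address from two 8-bit symbols and read one 8-bit output."""
    return table[(a << P) | b]


def hvq_encode(block, tables) -> int:
    """Encode a block of 2**len(tables) symbols into a single 8-bit index.

    Each level pairs up the outputs of the previous level, so the vector
    dimension doubles and the number of symbols halves at every level.
    """
    level = list(block)
    for table in tables:
        level = [lookup(table, level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


tables = [make_placeholder_table() for _ in range(3)]  # Tables 1, 2 and 3
code = hvq_encode([10, 20, 30, 40, 50, 60, 70, 80], tables)
```

Encoding the 8-dimensional block takes 4 + 2 + 1 = 7 lookups, i.e. fewer than one lookup per input symbol.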

A signal flow diagram for such an encoder is shown in FIGURE 1b. In the HVQ of FIGURE 1b, the tables T at each
stage of the encoder along with the delays Z are illustrated. Each level in the hierarchy doubles the vector dimension
of the quantizer, and therefore reduces the bit rate by a factor of 2. By similar reasoning, the ith level in the hierarchy
performs one lookup per $2^i$ samples, and therefore the total number of lookups per sample is at most 1/2 + 1/4 + 1/8
+ ... = 1, regardless of the number of levels. Of course, it is possible to vary these calculations by adjusting the dimensions
of the various tables.
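
The bound on lookups per sample is a geometric series, which can be checked with exact arithmetic. The helper name below is hypothetical; it simply sums one lookup per $2^i$ samples over the levels.

```python
from fractions import Fraction


def lookups_per_sample(levels: int) -> Fraction:
    """Sum 1/2 + 1/4 + ... + 1/2**levels: level i does one lookup per 2**i samples."""
    return sum((Fraction(1, 2 ** i) for i in range(1, levels + 1)), Fraction(0))


three_levels = lookups_per_sample(3)   # the three-table structure of FIGURE 1a
ten_levels = lookups_per_sample(10)    # still strictly below one lookup per sample
```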

The contents of the HVQ tables can be determined in a variety of ways. A straightforward way is the following. With
reference to FIGURE 1a, Table 1 is simply a table-lookup version of an optimal 2-dimensional VQ. That is, an optimal
2-dimensional full search VQ with M = 256 codewords is designed by standard means (e.g., the generalized Lloyd
algorithm discussed by Gersho and Gray), and Table 1 is filled so that it assigns to each of its $2^{16}$ possible 2-dimensional
input vectors the 8-bit index of the nearest codeword.

Table 2 is just slightly more complicated. First, an optimal 4-dimensional full search VQ with M = 256 codewords is
designed by standard means. Then Table 2 is filled so that it assigns to each of its $2^{16}$ possible 4-dimensional input
vectors (i.e., the cross product of all possible 2-dimensional output vectors from the first stage) the 8-bit index of its
nearest codeword. The tables for stages 3 and up are designed similarly. Note that the distortion measure is completely
arbitrary.
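
The table-filling procedure for Tables 1 and 2 might look like the sketch below. To keep it quick to run, it assumes a toy setting (p = 3 bits, tiny hand-picked codebooks) rather than the p = 8, M = 256 case in the text, and the codebooks are not the result of Lloyd training. All names are hypothetical.

```python
def nearest(vec, codebook) -> int:
    """Index of the codeword closest to vec in the squared-error sense."""
    return min(
        range(len(codebook)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(vec, codebook[j])),
    )


def fill_table_1(codebook2, p: int):
    """Table 1: every pair of p-bit inputs -> index of the nearest 2-D codeword."""
    n = 1 << p
    return [nearest((a, b), codebook2) for a in range(n) for b in range(n)]


def fill_table_2(codebook2, codebook4):
    """Table 2: every pair of stage-1 indices -> index of the nearest 4-D codeword.

    The 4-D input is the concatenation of the two 2-D codewords that the
    stage-1 indices stand for (the cross product mentioned in the text).
    """
    m = len(codebook2)
    return [
        nearest(codebook2[i] + codebook2[j], codebook4)
        for i in range(m)
        for j in range(m)
    ]


# Toy codebooks (assumed for illustration; a real design would train them).
codebook2 = [(1, 1), (1, 6), (6, 1), (6, 6)]
codebook4 = [(1, 1, 1, 1), (6, 6, 6, 6)]
table1 = fill_table_1(codebook2, p=3)
table2 = fill_table_2(codebook2, codebook4)
```

With p = 8 and M = 256 the same loops would produce the $2^{16}$-entry tables described above; only the table size changes.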

A discrete wavelet transformation (DWT), or more generally, a tree-structured subband decomposition, is a method
for hierarchical signal transformation. Little or no information is lost in such a transformation. Each stage of a DWT
involves filtering a signal into a low-pass component and a high-pass component, each of which is critically sampled
(i.e., down sampled by a factor of two). A more general tree-structured subband decomposition may filter a signal into
more than two bands per stage, and may or may not be critically sampled. Here we consider only the DWT, but those
skilled in the art can easily extend the relevant notions to the more general case.

With reference to FIGURE 2a, let $X = (x(0), x(1), \ldots, x(N-1))$ be a 1-dimensional input signal with finite length N. As
shown by the tree structure A, the first stage of a DWT decomposes the input signal $X_{L0} = X$ into the low-pass and
high-pass signals $X_{L1} = (x_{L1}(0), x_{L1}(1), \ldots, x_{L1}(N/2-1))$ and $X_{H1} = (x_{H1}(0), x_{H1}(1), \ldots, x_{H1}(N/2-1))$, each of
length N/2. The second stage decomposes only the low-pass signal $X_{L1}$ from the first stage into the low-pass and
high-pass signals $X_{L2} = (x_{L2}(0), x_{L2}(1), \ldots, x_{L2}(N/4-1))$ and $X_{H2} = (x_{H2}(0), x_{H2}(1), \ldots, x_{H2}(N/4-1))$, each of
length N/4. Similarly, the third stage decomposes only the low-pass signal $X_{L2}$ from the second stage into low-pass and
high-pass signals $X_{L3}$ and $X_{H3}$ of lengths N/8, and so on. It is also possible for successive stages to decompose some
of the high-pass signals in addition to the low-pass signals. The set of signals at the leaves of the resulting complete or
partial tree is precisely the transform of the input signal at the root. Thus a DWT can be regarded as a hierarchically
nested set of transforms.

To specify the transform precisely, it is necessary to specify the filters used at each stage. We consider only finite
impulse response (FIR) filters, i.e., wavelets with finite support. L is the length of the filters (i.e., number of taps), and
the low-pass filter (the scaling function) and the high-pass filter (the difference function, or wavelet) are designated by
their impulse responses, $l(0), l(1), \ldots, l(L-1)$, and $h(0), h(1), \ldots, h(L-1)$, respectively. Then at the output of the mth stage,

$$x_{L,m}(i) = l(0)\,x_{L,m-1}(2i) + l(1)\,x_{L,m-1}(2i+1) + \cdots + l(L-1)\,x_{L,m-1}(2i+L-1)$$

$$x_{H,m}(i) = h(0)\,x_{L,m-1}(2i) + h(1)\,x_{L,m-1}(2i+1) + \cdots + h(L-1)\,x_{L,m-1}(2i+L-1)$$

for $i = 0, 1, \ldots, N/2^m - 1$. Boundary effects are handled in some expedient way, such as setting signals to zero outside
their windows of definition. The filters may be the same from node to node, or they may be different.
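
One analysis stage following the two equations above can be sketched directly. The filters used in the demonstration are an assumed two-tap averaging/differencing pair chosen only to make the arithmetic easy to follow; the source does not specify filter coefficients. Samples outside the signal window are treated as zero, which is one of the expedients the text mentions.

```python
def analyze(x, l, h):
    """One DWT stage: filter x with l and h and keep every second output.

    Returns (low, high), each of length len(x) // 2. Samples outside the
    window of definition are taken to be zero.
    """
    taps = len(l)

    def sample(n):
        return x[n] if 0 <= n < len(x) else 0.0

    half = len(x) // 2
    low = [sum(l[k] * sample(2 * i + k) for k in range(taps)) for i in range(half)]
    high = [sum(h[k] * sample(2 * i + k) for k in range(taps)) for i in range(half)]
    return low, high


# Assumed illustrative filters (L = 2), not taken from the source.
l = (0.5, 0.5)
h = (0.5, -0.5)

x = [4.0, 2.0, 6.0, 6.0]
low1, high1 = analyze(x, l, h)      # first stage: two signals of length N/2
low2, high2 = analyze(low1, l, h)   # second stage decomposes only the low-pass signal
```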

The inverse transform is performed by different low-pass and high-pass filters, called reconstruction filters, applied
in reverse order. Let $l'(0), l'(1), \ldots, l'(L-1)$ and $h'(0), h'(1), \ldots, h'(L-1)$ be the impulse responses of the inverse filters. Then
$X_{L,m-1}$ can be reconstructed from $X_{L,m}$ and $X_{H,m}$ as:

$$x_{L,m-1}(2i) = l'(0)\,x_{L,m}(i) + l'(2)\,x_{L,m}(i+1) + h'(0)\,x_{H,m}(i) + h'(2)\,x_{H,m}(i+1)$$

$$x_{L,m-1}(2i+1) = l'(1)\,x_{L,m}(i+1) + l'(3)\,x_{L,m}(i+2) + h'(1)\,x_{H,m}(i+1) + h'(3)\,x_{H,m}(i+2)$$

for $i = 0, 1, \ldots, N/2^m - 1$. That is, the low-pass and high-pass bands are up sampled (interpolated) by a factor of two,
filtered by their respective reconstruction filters, and added.

Two-dimensional signals are handled similarly, but with two-dimensional filters. Indeed, if the filters are separable,
then the filtering can be accomplished by first filtering in one dimension (say horizontally along rows), then filtering in
the other dimension (vertically along columns). This results in the hierarchical decompositions illustrated in FIGURES
2B, showing tree structure B, and 2C, in which the odd stages operate on rows, while the even stages operate on
columns. If the input signal $X_{L0}$ is an N x N image then $X_{L1}$ and $X_{H1}$ are N x (N/2) images, $X_{LL2}$, $X_{LH2}$, $X_{HL2}$, and $X_{HH2}$
are (N/2) x (N/2) images, and so forth.

Moreover, notwithstanding that which is known about HVQ and DWT, a wide variety of video image compression
methods and apparatuses have been implemented. One existing method that addresses transcoding problems is the
algorithm of J. Shapiro, "Embedded Image Coding Using Zerotrees of Wavelet Coefficients," IEEE Transactions on
Signal Processing, December 1993, in which transcoding can be done simply by stripping off prefixes of codes in the
bit stream. However, this algorithm trades simple transcoding for computationally complex encoding and decoding.

Other known methods lack certain practical and convenient features. For example, these other known video
compression methods do not allow a user to access the transmitted image at different quality levels or resolutions during
an interactive multicast over multiple rate channels in a simplified system wherein encoding and decoding are
accomplished solely by the performance of table lookups.

More particularly, using these other non-embedded encoding video compression algorithms, when a multicast (or
simulcast, as applied in the television industry) of a video stream is accomplished over a network, either every receiver
of the video stream is restricted to a certain quality (and hence bandwidth) level at the sender or bandwidth (and CPU
cycles or compression hardware) is unnecessarily used by multicasting a number of streams at different bit rates.

In video conferencing (multicast) over a heterogeneous network comprising, for example, ATM, the Internet, ISDN
and wireless, some form of transcoding is typically accomplished at the 'gateway' between sender and receiver when
a basic rate mismatch exists between them. One solution to the problem is for the 'gateway'/receiver to decompress
the video stream and recompress and scale it according to internal capabilities. This solution, however, is not only
expensive but also increases latency by a considerable amount. The transcoding is preferably done in an online fashion
(with minimal latency/buffering) due to the interactive nature of the application and to reduce hardware/software costs.

From a user's perspective, the problem is as follows: (a) Sender(i) wants to send a video stream at K bits/sec to M
receivers; (b) Receiver(j) wants to receive Sender(i)'s video stream at L bits/sec (L<K); but (c) the image dimensions
that Receiver(j) desires or is capable of processing are smaller than the default dimensions that Sender(i) encoded.

It is desirable that any system and/or method to address these problems in interactive video advantageously
incorporate (1) inexpensive transcoding from a higher to a lower bit rate, preferably by only operating on a compressed
stream, (2) simple bit rate control, (3) simple scalability of dimension at the destination, (4) symmetry (resulting in very
inexpensive decode and encode), and (5) a prioritized compressed stream in addition to acceptable rate-distortion
performance. None of the current standards (Motion JPEG, MPEG and H.261) possess all of these characteristics. In
particular, no current standard has a facility to transcode from a higher to a lower bit rate efficiently. In addition, all are
computationally expensive.

The present invention seeks to overcome the aforenoted and other problems and to incorporate the desired
characteristics noted above. It is particularly directed to the art of video data compression, and will thus be described with
specific reference thereto. It is appreciated, however, that the invention will have utility in other fields and applications.

The present invention provides a method for compressing and transmitting data, the method comprising steps of:
receiving the data; successively performing multiple stages of first lookup operations to obtain compressed data at each
stage representing vector quantized discrete subband coefficients; and, transmitting the compressed data to a receiver.

The method may comprise successively performing i levels of a first table lookup operation to obtain, for example,
$2^i$:1 compressed data representing a subband transform, e.g., DWT, of the input data followed by vector quantization
thereof.

In accordance with another aspect of the present invention, the compressed data (which may also be transcoded
at a gateway) is transmitted to a receiver.

In accordance with another aspect of the present invention, the compressed data is received at a receiver and
multiple stages of a second table lookup operation are performed to selectively obtain decompressed data representing
at least a partial inverse subband transform of the compressed data.

The invention further provides an apparatus for carrying out the methods as set forth above or in accordance with
any of the embodiments described herein.

One advantage of the present invention is that encoding and decoding are accomplished solely by table lookups.
This results in very efficient implementations. For example, this algorithm enables 30 frames/sec encoding (or decoding)
of CIF (320x240) resolution video on Sparc 2 class machines with just 50% CPU loading.

Another advantage of the present invention is that, since only table lookups are utilized, the hardware implemented
to perform the method is relatively simple. An address generator and a limited number of memory chips accomplish the
method. The address generator could be a microsequencer, a gate array, an FPGA or a simple ASIC.

The present invention exists in the construction, arrangement, and combination of the various parts of the device,
whereby the objects contemplated are attained as hereinafter more fully set forth, specifically pointed out in the claims,
and illustrated by way of exemplary embodiments in the accompanying drawings in which:

FIGURE 1a illustrates a table structure of prior art HVQ;

FIGURE 1b is a signal flow diagram illustrating prior art HVQ for speech coding;

FIGURES 2a-c are a graphical representation of a prior art DWT;

FIGURE 3 is a flowchart representing the preferred method of the present invention;

FIGURES 4a-b are graphical representations of a single stage of an encoder performing a WWHVQ in the method
of FIGURE 3;

FIGURE 4c is a signal flow diagram illustrating an encoder performing a WWHVQ in the method of FIGURE 3;

FIGURE 5 is a signal flow diagram of a decoder used in the method of FIGURE 3;

FIGURE 6 is a block diagram of a system using 3-D subband coding in connection with the method of FIGURE 3;

FIGURE 7 is a block diagram of a system using frame differencing in connection with the method of FIGURE 3;

FIGURE 8 is a schematic representation of the hardware implementation of the encoder of the method of FIGURE 3;

FIGURE 9 is a schematic representation of another hardware implementation of the encoder of the method of FIGURE 3;

FIGURE 10 is a schematic representation of the hardware implementation of the decoder of the method of FIGURE 3; and

FIGURE 11 is a schematic representation of another hardware implementation of the decoder of the method of
FIGURE 3.


Referring now to the drawings wherein the showings are for the purposes of illustrating the preferred embodiments
of the invention only and not for purposes of limiting same, FIGURE 3 provides a flowchart of the overall preferred
embodiment. It is recognized that the method is suitably implemented in the structure disclosed in the following preferred
embodiment and operates in conjunction with software based control procedures. However, it is contemplated that the
control procedure be embodied in other suitable mediums.

As shown, the WWHVQ procedure is initiated by obtaining, or receiving in the system implementing the method,
input data representing an NxN pixel image where 8 bits represent a pixel (steps 10 and 12). A look-up operation is
performed to obtain data representing a subband transform followed by a vector quantization of the data (step 14). In
the preferred embodiment, a discrete wavelet transform comprises the subband transform. However, it is recognized
that other subband transforms will suffice. Upon completion of the look-up, a data compression has been performed.
Preferably, such compression is 2:1. Further stages will result in further compression of the data, e.g., 4:1, 8:1, 16:1,
32:1, 64:1, etc. It is appreciated that other compression ratios are possible and may be desired in certain applications.
Successive compression stages, or iterations of step 14, compose the hierarchy of the WWHVQ. Accordingly, a
determination is made whether the compression is complete (step 16). If the compression is incomplete, further look-up is
performed. If the desired compression is achieved, however, the compressed data is transmitted using any known
transmitter (step 18). It is then determined at, for example, a network gateway, whether further compression is required (step
19). If so, transcoding is performed in an identical manner as the encoding (step 20). In any event, the receiver eventually
receives the compressed data using known receiving techniques and hardware (step 22). Subsequently, a second
look-up operation is performed to obtain data representing an inverse subband transform, preferably an inverse DWT,
of decompressed data (step 24). After one complete stage, the data is decompressed. Further stages allow for further
decompression of the data to a desired level. A determination is then made whether decompression is complete (step
26). If the decompression is incomplete, further look-ups are performed. If, however, the decompression is complete,
the WWHVQ procedure is ended (step 28).
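
The sender and gateway side of this flow (steps 14 to 20) can be sketched as a loop that repeats a lookup stage until the target ratio is reached. The sketch makes several simplifying assumptions that are not in the source: the data is one-dimensional, each stage is collapsed to a single lookup per pair of codes, and the tables hold placeholder contents (the average of the two inputs) rather than trained WWHVQ entries. Function names are hypothetical.

```python
def make_placeholder_table() -> bytes:
    """2^16-entry stand-in table; NOT trained WWHVQ contents."""
    return bytes((a + b) // 2 for a in range(256) for b in range(256))


def stage(codes, table):
    """One compression stage (step 14): each pair of 8-bit codes becomes one 8-bit code."""
    return [table[(codes[i] << 8) | codes[i + 1]] for i in range(0, len(codes) - 1, 2)]


def compress(codes, tables, stages_done, target_ratio):
    """Repeat step 14 until the ratio 2**stages_done reaches target_ratio (step 16).

    tables[s] is the table used by stage s + 1; stages_done lets a gateway
    continue from where the sender stopped (steps 19 and 20).
    """
    while 2 ** stages_done < target_ratio:
        codes = stage(codes, tables[stages_done])
        stages_done += 1
    return codes, stages_done


shared = make_placeholder_table()
tables = [shared] * 5                 # one (placeholder) table per stage

pixels = list(range(64))              # 64 input symbols, 8 bits each
sent, sender_stages = compress(pixels, tables, 0, 16)                # sender stops at 16:1
relayed, gateway_stages = compress(sent, tables, sender_stages, 32)  # gateway goes to 32:1
```

The gateway reuses the same routine as the sender and only touches the compressed codes, which mirrors the statement that transcoding is performed in an identical manner as the encoding.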

The embodiment of FIGURE 3 uses a hierarchical coding approach, advantageously incorporating particular
features of hierarchical vector quantization (HVQ) and the discrete wavelet transform (DWT). HVQ is extremely fast, though
its performance directly on images is mediocre. On the other hand, the DWT is computationally demanding, though it
is known to greatly reduce blocking artifacts in coded images. The DWT coefficients can also be weighted to match the
human visual sensitivity in different frequency bands. This results in even better performance, since giving higher weights
to the more visually important bands ensures that they will be quantized to a higher precision. The present invention
combines HVQ and the DWT in a novel manner, to obtain the best qualities of each in a single system, Weighted Wavelet
HVQ (WWHVQ).

The basic idea behind WWHVQ is to perform the DWT filtering using table lookups. Assume the input symbols have
already been finely quantized to p bits of precision. For monochrome image data, p = 8 is typical. (For color image data,
each color plane can be treated separately, or they can be vector quantized together into p bits). In principle it is possible
to filter the data with an L-tap filter with one lookup per output symbol using a table with an Lp-bit address space. Indeed,
in principle it is possible to perform both the low-pass filtering and the high-pass filtering simultaneously, by storing both
low-pass and high-pass results in the table. Of course, such a table is clearly infeasible if L and p are even moderately
large. We take an approach similar to that of HVQ: for a filter of length L, $\log_2 L$ 'substages' of table lookup can be used,
if each table has 2p input bits and p output bits. The p output bits of the final substage represent a 2-dimensional vector:
one symbol from the low-pass band and a corresponding symbol from the high-pass band. Thus, the wavelet coefficients
output from the table are vector quantized. In this sense, the DWT is tightly integrated with the HVQ. The wavelet
coefficients at each stage of the DWT are vector quantized, as are the intermediate results (after each substage) in the
computation of the wavelet coefficients by table lookup.
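
A quick calculation shows why the single direct table is infeasible while the substage tables are not. The helper names are hypothetical, and L is assumed to be a power of two so that $\log_2 L$ is an integer.

```python
def direct_table_entries(taps: int, p: int) -> int:
    """Entries in a single table addressed by all L inputs at once: 2**(L*p)."""
    return 2 ** (taps * p)


def substage_count(taps: int) -> int:
    """Number of lookup substages, log2(L), for L a power of two."""
    return taps.bit_length() - 1


def substage_table_entries(p: int) -> int:
    """Entries in one substage table with 2p input bits: 2**(2*p)."""
    return 2 ** (2 * p)


direct = direct_table_entries(4, 8)     # L = 4 taps, p = 8 bits
substages = substage_count(4)
per_table = substage_table_entries(8)
```

For L = 4 and p = 8 the direct table would need $2^{32}$ entries, whereas the hierarchical approach uses two substages of 65,536-entry tables.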

FIGURE 4a shows one stage i of the WWHVQ, organized as $\log_2 L$ substages of table lookup. Here, L = 4, so that
the number of substages is two. Note that the filters slide over by two inputs to compute each output. This corresponds
to down sampling (decimation) by a factor of 2, and hence a reduction in bit rate by a factor of 2. The second stage of the
WWHVQ operates on coded outputs from the first stage, again using $\log_2 L$ substages of table lookup (but with different
tables than in the first stage) and so on for the following stages. It is recognized that each of a desired number of stages
operates in substantially the same way so that the p bits at the output of stage i represent a $2^i$:1 compression. The p
bits at the output of the final stage can be transmitted directly (or indirectly via a variable-length code). The transmitted
data can be further compressed or "transcoded", for example, at a gateway between high and low capacity networks,
simply by further stages of table lookup, each of which reduces the bit rate by a factor of 2. Hence both encoding and
transcoding are extremely simple.

The tables in the first stage (i = 1) may be designed as follows, with reference to FIGURE 4a. In this discussion, we
shall assume that the input signal $X = (x(0), x(1), \ldots, x(N-1))$ is one-dimensional. Those skilled in the art will have little
difficulty generalizing the discussion to image data. First decompose the input signal $X_{L0} = X$ into low-pass and high-pass
signals $X_{L1}$ and $X_{H1}$, each of length N/2. This produces a sequence of 2-dimensional vectors

$$x(i) = [x_{L1}(i),\ x_{H1}(i)], \quad \text{for } i = 0, 1, \ldots, N/2 - 1,$$

where

$$x_{L1}(i) = l(0)x(2i) + l(1)x(2i+1) + l(2)x(2i+2) + l(3)x(2i+3)$$

and

$$x_{H1}(i) = h(0)x(2i) + h(1)x(2i+1) + h(2)x(2i+2) + h(3)x(2i+3).$$

Such 2-dimensional vectors are used to train an 8-bit vector quantizer $Q_{1.2}$ for Table 1.2. Likewise, 2-dimensional vectors

$$[\,l(0)x(2i) + l(1)x(2i+1),\ h(0)x(2i) + h(1)x(2i+1)\,]$$

are used to train an 8-bit vector quantizer $Q_{1.1a}$ for Table 1.1a, and 2-dimensional vectors

$$[\,l(2)x(2i+2) + l(3)x(2i+3),\ h(2)x(2i+2) + h(3)x(2i+3)\,]$$

are used to train an 8-bit vector quantizer $Q_{1.1b}$ for Table 1.1b. All three quantizers are trained to minimize the expected
weighted squared error distortion measure $d(x,y) = [w_{L1}(x_0 - y_0)]^2 + [w_{H1}(x_1 - y_1)]^2$, where the constants $w_{L1}$ and $w_{H1}$
are proportional to the human perceptual sensitivity (i.e., inversely proportional to the just noticeable contrast) in the
low-pass and high-pass bands, respectively. Then Table 1.1a is filled so that it assigns to each of its $2^{16}$ possible
2-dimensional input vectors $[x_0, x_1]$ the 8-bit index of the codeword $[y_{L1.1a}, y_{H1.1a}]$ of $Q_{1.1a}$ nearest to
$[l(0)x_0 + l(1)x_1,\ h(0)x_0 + h(1)x_1]$ in the weighted squared error sense. Table 1.1b is filled so that it assigns to each of its
$2^{16}$ possible 2-dimensional input vectors $[x_2, x_3]$ the 8-bit index of the codeword $[y_{L1.1b}, y_{H1.1b}]$ of $Q_{1.1b}$ nearest to
$[l(2)x_2 + l(3)x_3,\ h(2)x_2 + h(3)x_3]$ in the weighted squared error sense. And finally Table 1.2 is filled so that it assigns to
each of its $2^{16}$ possible 4-dimensional input vectors (i.e., the cross product of all possible 2-dimensional output vectors
from the first stage), for example, $[y_{L1.1a}, y_{H1.1a}, y_{L1.1b}, y_{H1.1b}]$, the 8-bit index of the codeword $[y_{L1}, y_{H1}]$ of $Q_{1.2}$
nearest to $[y_{L1.1a} + y_{L1.1b},\ y_{H1.1a} + y_{H1.1b}]$ in the weighted squared error sense.
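
The weighted distortion measure and the filling of Table 1.1a can be sketched as below. To stay small, the example assumes p = 2 bits per input, a four-codeword quantizer, a two-tap averaging/differencing pair for l(0), l(1), h(0), h(1), and arbitrary weights; none of these values come from the source, and the names are hypothetical.

```python
def weighted_sq_error(x, y, w) -> float:
    """d(x, y) = sum over components of [w_k * (x_k - y_k)]^2."""
    return sum((wk * (xk - yk)) ** 2 for xk, yk, wk in zip(x, y, w))


def fill_table_1_1a(codebook, l, h, w, p: int):
    """For every input pair (x0, x1), store the index of the codeword nearest to
    [l(0)x0 + l(1)x1, h(0)x0 + h(1)x1] in the weighted squared error sense."""
    n = 1 << p
    table = []
    for x0 in range(n):
        for x1 in range(n):
            target = (l[0] * x0 + l[1] * x1, h[0] * x0 + h[1] * x1)
            best = min(
                range(len(codebook)),
                key=lambda j: weighted_sq_error(target, codebook[j], w),
            )
            table.append(best)
    return table


# Assumed illustrative values (not from the source).
l = (0.5, 0.5)            # first two low-pass taps l(0), l(1)
h = (0.5, -0.5)           # first two high-pass taps h(0), h(1)
w = (1.0, 0.5)            # (w_L1, w_H1): low-pass band weighted more heavily
codebook = [(0.0, 0.0), (3.0, 0.0), (1.5, 1.5), (1.5, -1.5)]

table_1_1a = fill_table_1_1a(codebook, l, h, w, p=2)
example_distortion = weighted_sq_error((0.0, 0.0), (1.0, 2.0), w)
```

Table 1.1b would be filled the same way using l(2), l(3), h(2), h(3) and its own quantizer.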

For a small cost in performance, it is possible to design the tables so that Table 1.1a and Table 1.1b are the same,
for instance, Table i.1, if i = 1 in FIGURE 4b. In this case, Table 1.1 is simply a table lookup version of a 2-dimensional
VQ that best represents pairs of inputs $[x_0, x_1]$ in the ordinary (unweighted) squared error sense. Then Table 1.2 is filled
so that it assigns to each of its $2^{16}$ possible 4-dimensional input vectors (i.e., the cross product of all possible
2-dimensional output vectors from the first stage), for example, $[y_0, y_1, y_2, y_3]$, the 8-bit index of the codeword $[y_{L1}, y_{H1}]$ of $Q_{1.2}$
nearest to $[l(0)y_0 + l(1)y_1 + l(2)y_2 + l(3)y_3,\ h(0)y_0 + h(1)y_1 + h(2)y_2 + h(3)y_3]$ in the weighted squared error sense. Making
Tables 1.1a and 1.1b the same would result in a savings of both table memory and computation, as shown in FIGURE
4b. The corresponding signal flow diagram is shown in FIGURE 4c.

Referring again to FIGURE 4a, the tables in the second stage (i = 2) are just slightly more complicated. Decompose
the input signal $X_{L1}$, of length N/2, into low-pass and high-pass signals $X_{L2}$ and $X_{H2}$, each of length N/4. This produces
a sequence of 4-dimensional vectors $x(i) = [x_{L2}(i), x_{H2}(i), x_{H1}(2i), x_{H1}(2i+1)]$, for $i = 0, 1, \ldots, N/4 - 1$, where
$x_{L2}(i) = l(0)x_{L1}(2i) + l(1)x_{L1}(2i+1) + l(2)x_{L1}(2i+2) + l(3)x_{L1}(2i+3)$ and
$x_{H2}(i) = h(0)x_{L1}(2i) + h(1)x_{L1}(2i+1) + h(2)x_{L1}(2i+2) + h(3)x_{L1}(2i+3)$.
Such 4-dimensional vectors are used to train an 8-bit vector quantizer $Q_{2.2}$ for Table 2.2. Likewise, 4-dimensional
vectors $[l(0)x_{L1}(2i) + l(1)x_{L1}(2i+1),\ h(0)x_{L1}(2i) + h(1)x_{L1}(2i+1),\ x_{H1}(2i),\ x_{H1}(2i+1)]$ are used to train an 8-bit vector
quantizer $Q_{2.1a}$ for Table 2.1a, and 4-dimensional vectors
$[l(2)x_{L1}(2i+2) + l(3)x_{L1}(2i+3),\ h(2)x_{L1}(2i+2) + h(3)x_{L1}(2i+3),\ x_{H1}(2i+2),\ x_{H1}(2i+3)]$
are used to train an 8-bit vector quantizer $Q_{2.1b}$ for Table 2.1b. All three quantizers are trained
to minimize the expected weighted squared error distortion measure
$d(x,y) = [w_{L2}(x_0 - y_0)]^2 + [w_{H2}(x_1 - y_1)]^2 + [w_{H1}(x_2 - y_2)]^2 + [w_{H1}(x_3 - y_3)]^2$,
where the constants $w_{L2}$, $w_{H2}$ and $w_{H1}$ are proportional to the human perceptual sensitivities in
their respective bands. Then Table 2.1a is filled so that it assigns to each of its $2^{16}$ possible 4-dimensional input vectors
$[y_0, y_1, y_2, y_3]$ the 8-bit index of the codeword of $Q_{2.1a}$ nearest to $[l(0)y_0 + l(1)y_1,\ h(0)y_0 + h(1)y_1,\ y_2,\ y_3]$ in the weighted
squared error sense, and so on for Tables 2.1b and 2.2 in the second stage, and the tables in any succeeding stages.

As in HVQ, in WWHVQ the vector dimension doubles with each succeeding stage. The formats of the vectors at
each stage are shown as outputs of the encoder 30, graphically represented in FIGURE 4c. These formats are for the
case of the two-dimensional separable DWT shown in FIGURE 2b.

Referring now to the case where all tables in a given substage are identical as in FIGURES 4b and 4c, if the number
of inputs to a stage is S bytes, then the number of outputs is:

$$S/2$$

and the number of table lookups per output is $\log L$. If the compression ratio is C, then the total number of outputs
(including the outputs of intermediate stages) is:

$$N^2 (C-1)/C$$

for an image of size $N^2$. Thus, the total number of table lookups is:

$$\left(N^2 (C-1) \log L\right)/C$$

If the amount of storage needed for the HVQ encoder is T, then the amount of storage needed for the WWHVQ encoder
is $T \log L$ per stage.
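
The closed-form output count can be checked against a stage-by-stage sum, since each stage halves the number of symbols. The sketch assumes the logarithm is base 2, which is consistent with the $\log_2 L$ substages described earlier, and the function names are hypothetical.

```python
def outputs_by_stage(n: int, stages: int):
    """Outputs of stage s for an N x N image: N^2 / 2**s (each stage halves its input)."""
    return [n * n // 2 ** s for s in range(1, stages + 1)]


def total_outputs(n: int, c: int) -> int:
    """Closed form N^2 (C - 1) / C, including outputs of intermediate stages."""
    return n * n * (c - 1) // c


def total_lookups(n: int, c: int, taps: int) -> int:
    """Total encoder lookups: log2(L) lookups per output (L a power of two)."""
    return total_outputs(n, c) * (taps.bit_length() - 1)


per_stage = outputs_by_stage(256, 6)       # 256 x 256 image, 6 stages -> C = 64
outputs = total_outputs(256, 64)
lookups = total_lookups(256, 64, taps=4)
```

For a 256 x 256 image at 64:1 with L = 4 this comes to fewer than two lookups per input pixel.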

Also shown in FIGURE 4c are the respective delays Z which successively increase with each stage. The oval symbol
including the ↓2 designation indicates that only one of every two outputs is selected, as those skilled in the art will
appreciate.

The WWHVQ decoder 50 which performs steps 22-26 of FIGURE 3 is shown in FIGURE 5. All tables of decoder
50 in a given substage are identical, similar to the encoder of FIGURES 4b and 4c. As those skilled in the art will
appreciate, the odd-even split tables 52, 54 at each stage handle the interpolation by 2 that is part of the DWT reconstruction.
If L = 4 and the filter coefficients are h(i) and l(i), i = 0, 1, 2, 3, then the odd table 52 computes $h(1)x(i) + h(3)x(i+1)$ and
$l(1)x(i) + l(3)x(i+1)$, while the even table 54 computes $h(0)x(i) + h(2)x(i+1)$ and $l(0)x(i) + l(2)x(i+1)$, where the x(i)'s are the
inputs to that stage. If the number of inputs to a stage is S, then the number of outputs from that stage is 2S. The total
number of table lookups for the stage is:

$$S \log(L/2)$$

If the compression ratio is C, then the total number of outputs (including outputs of intermediate stages) is:

$$2N^2 (C-1)/C$$

for an image of size $N^2$. Thus, the total number of table lookups is:

$$\left(N^2 (C-1)/C\right) \log(L/2)$$

If the amount of storage needed for the HVQ encoder is T, then the amount of storage needed for the WWHVQ decoder
per stage is:

$$T\left(\log(L/2) + 1\right) = T \log L$$

All the storage requirements presented are for 8-bits per pixel input. For color images (YUV, 4:2:2 format) the storage
requirements double. Similarly the number of table lookups also doubles for color images.

As shown in FIGURES 6 and 7, there are two options for handling motion and interframe coding using the present
method. In a first mode (FIGURE 6), the subband coding is extended and followed by a vector quantization scheme that
allows for intraframe coding performed as described in connection with encoder 30 to interframe coding, designated by
reference numeral 80. This is similar to 3-D subband coding.

The second way (FIGURE 7) of handling motion is to use a simple frame differencing scheme that operates on the
compressed data and uses an ordered codebook to decide whether to transmit a certain address. Specifically, the
WWHVQ encoder 30 shown in FIGURE 4c is used in conjunction with a frame differencer 70. The frame differencer 70
uses a current frame encoded by encoder 30 and a previous frame 72 as inputs to obtain a result.

Some of the features of the WWHVQ of the present invention are:

Transcoding 

The sender transmits a video stream compressed at 16:1. The receiver requests the 'gateway' for a 32:1 stream
(or the gateway sees that the slower link can only handle 32:1). All the gateway has to do to achieve this transcoding
is do a further level of table lookup using the data it receives (at 16:1) as input. This is extremely useful, especially when
a large range of bandwidths has to be supported, as, for example, in a heterogeneous networked environment.
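A minimal Python sketch of the gateway step follows. It assumes 8-bit codewords, a next-stage table addressed by a pair of codewords packed into 16 bits, and pairing of adjacent codewords in the stream; the real pairing depends on the row and column ordering of the stage. The function name and the table contents are hypothetical.

```python
def transcode_one_level(codes, table):
    """Halve the code stream with one further level of table lookup.

    codes: sequence of 8-bit codewords received at the current ratio.
    table: sequence of 65536 entries mapping a packed pair of codewords
           to one 8-bit codeword (stand-in for the next WWHVQ table).
    """
    out = []
    for a, b in zip(codes[0::2], codes[1::2]):
        address = (a << 8) | b     # two codewords form one address
        out.append(table[address])
    return out
```

Each call doubles the compression ratio, so a 16:1 stream becomes a 32:1 stream without decoding it first.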

Dimension Scaling

If the video stream is compressed up to J stages, then the receiver has a choice of (J/2) + 1 image sizes without any
extra effort. For example, if a 256 x 256 image is compressed to 6 stages (i.e., 64:1), then the receiver can reconstruct
a 32 x 32 or 64 x 64 or 128 x 128 or 256 x 256 image without any overhead. This is done by just using the low pass
bands LL available at the even numbered stages (see FIGURE 3). Also, since the whole method is based upon table
look-ups, it is very easy to scale the image up, i.e., the interpolation and lowpass filtering can be done ahead of time.
Note also that all this is done on the compressed bit-stream itself. In standard video compression schemes both down
and up scaling is achieved by explicitly filtering and decimating or interpolating the input or output image.
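The available sizes in the 64:1 example can be enumerated with a short Python sketch. It assumes that 64:1 corresponds to six stages (following the 2^i : 1 relation of claim 3) and that an LL band becomes available after every two stages, each halving the side length. The function name is hypothetical.

```python
def available_sizes(n, stages):
    """Side lengths recoverable from the LL bands of an n x n image.

    Assumes one LL band per pair of stages (one row stage plus one
    column stage), each halving the side length.
    """
    return [n >> k for k in range(stages // 2 + 1)]
```

For a 256 x 256 image and six stages this lists 256, 128, 64 and 32, the four sizes named in the example.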

Motion

There are two simple options for handling motion. Simple frame differencing can be applied to the compressed data
itself. Another option is to use a 3-D subband coding scheme in which the temporal dimension is handled using another
WWHVQ in conjunction with the spatial WWHVQ. The wavelet used here can be different (it is different in practice). In
a preferred implementation, motion detection and thresholding are accomplished. This is done on the compressed data
stream. The current compressed frame and the previous compressed frame are compared using a table lookup. This
generates a binary map which indicates places where motion has occurred. This map is used to decide which blocks
(codewords) to send and a zero is inserted in place of stationary blocks. This whole process is done in an online manner
with a run-length encoder. The table that is used to create the binary map is constructed off-line and the motion threshold
is set at that time.
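The motion detection path can be sketched in Python as below. This is a simplified illustration, not the patented implementation: it assumes 8-bit codewords, an ordered codebook in which the numerical distance between two codewords reflects how different the blocks are, and a simple (value, run) run-length format. All function names are hypothetical.

```python
def build_motion_table(threshold):
    """Off-line step: 1 where two codewords differ by more than threshold.

    Relies on the assumed ordered codebook, so that |a - b| is meaningful.
    """
    return [1 if abs((addr >> 8) - (addr & 0xFF)) > threshold else 0
            for addr in range(65536)]

def motion_map(current, previous, motion_table):
    """On-line step: one table lookup per pair of codewords."""
    return [motion_table[(c << 8) | p] for c, p in zip(current, previous)]

def select_blocks(current, mmap):
    """Send moving codewords; insert zero in place of stationary blocks."""
    return [c if m else 0 for c, m in zip(current, mmap)]

def run_length_encode(values):
    """Collapse repeated values into (value, run) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs
```

Because stationary blocks are replaced by zeros, long static regions collapse into a few runs.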

Dithering 

Typically, the decoder has to do color space conversion and possibly dithering (for < 24-bit displays) on the decoded
stream before it can display it. This is quite an expensive task compared to the complexity of the WWHVQ decoder. But
since the WWHVQ decoder is also based upon table lookups, these steps are incorporated into the decoder output
tables. Other than speeding up the decoder, the other major advantage of this technique is that it makes the decoder
completely independent of the display (resolution, depth, etc.). Thus, the same compressed stream can be decoded by
the same decoder on two different displays by just using different output tables.
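The idea of display-specific output tables can be illustrated with a small Python sketch. The two conversions shown are hypothetical stand-ins: real color space conversion and dithering are more involved and are not modeled here. What the sketch shows is only that the decoder loop stays the same while the output table changes.

```python
def build_output_table(convert):
    """Precompute the display value for every possible 8-bit output."""
    return [convert(v) for v in range(256)]

def to_rgb24(v):
    # Hypothetical 24-bit display: replicate the gray level into R, G, B.
    return (v, v, v)

def to_gray4(v):
    # Hypothetical 4-bit display: keep the four most significant bits.
    return v >> 4

def decode_for_display(decoded, output_table):
    """Final decoder step: one lookup per value, whatever the display."""
    return [output_table[v] for v in decoded]
```

The same decoded stream passed through `build_output_table(to_rgb24)` or `build_output_table(to_gray4)` yields values suited to each display.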

The WWHVQ method is very simple and inexpensive to implement in hardware. Basically, since the method
predominantly uses only table lookups, only memory and address generation logic is required. The architectures described
below differ only in the amount of memory they use versus the amount of logic. Since alternate stages operate on row
and column data, there is a need for some amount of buffering between stages. This is handled explicitly in the
architectures described below. All the storage requirements are given for 8-bits per pixel input. For color images (YUV, 4:2:2
format) the storage requirements double. But the timing information remains the same, since the luminance and
chrominance paths can be handled in parallel in hardware. Also, as noted above, two simple options are available for interframe
coding in WWHVQ. The 3-D subband coding option is implemented as an additional WWHVQ module, while the frame
differencing option is implemented as a simple comparator.

Referring now to FIGURE 8, each one of the tables i.1 (or i.1a and i.1b) and i.2 of the present invention is mapped
onto a memory chip (64KB in this case). The row-column alternation between stages is handled by using a buffer 60 of
NL bytes between stages, where N is the row dimension of the image and L is the length of the wavelet filter. For example,
between stages 1 and 2 this buffer is written in row format (by stage 1) and read in column format by stage 2. Some
simple address generation logic is required. Accordingly, address generator 90 is provided. The address generator 90
is any suitable device such as an incrementer, an accumulator, or an adder plus some combinational glue logic. However,
this architecture is almost purely memory. The input image is fed as the address to the first memory chip whose output
is fed as the address to the next memory chip and so on.
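The row-column alternation handled by the buffer can be illustrated with a short Python sketch of the read addresses. It assumes a buffer stored in row-major order and models the address generator as an accumulator that steps by one row length; the function name and buffer layout are illustrative assumptions.

```python
def column_read_addresses(rows, cols):
    """Addresses that read a row-major rows x cols buffer column by column.

    The write side needs only an incrementer (0, 1, 2, ...); the read
    side uses an accumulator that adds the row length at each step.
    """
    addresses = []
    for c in range(cols):
        addr = c                    # first element of column c
        for _ in range(rows):
            addresses.append(addr)
            addr += cols            # move down one row
    return addresses
```

For a 2 x 3 buffer the read order is 0, 3, 1, 4, 2, 5, so data written in row format by one stage is consumed in column format by the next.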

The total memory requirements for the encoder 30 and the decoder 50 are T log L + NL(M - 1) bytes, where M is
the number of stages. For example, the WWHVQ encoder and decoder shown in FIGURES 3 and 5 need 64KB for each
table i.1, i.2 plus the buffer memory 60. If the number of stages, M, is 6, the wavelet filter size, L, is 4 and the image row
dimension, N, is 256, then the amount of memory needed is 768KB + 5KB = 773KB. The number of 64KB chips needed
is 12 and the number of 1KB chips needed is 5. The throughput is obviously maximal, i.e., one output every clock cycle.
The latency per stage is NL clocks, except for the first stage which has a latency of 1 clock cycle. Thus the latency after
the m stages is (m - 1)NL + 1 clocks.
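The figures in this example can be reproduced with a few lines of Python. The sketch assumes base-2 logarithms and takes T as the number of stages times the 64KB table size, which is what the 768KB total implies; the function names are hypothetical.

```python
import math

def flow_through_memory(table_bytes, stages, n, l):
    """Return (table_bytes_total, buffer_bytes_total, total) in bytes."""
    t = table_bytes * stages                  # T, assumed M tables of 64KB
    tables = t * int(math.log2(l))            # T log L
    buffers = n * l * (stages - 1)            # NL(M - 1)
    return tables, buffers, tables + buffers

def flow_through_latency(m, n, l):
    """Latency in clocks after m stages: (m - 1)NL + 1."""
    return (m - 1) * n * l + 1
```

With 64KB tables, M = 6, N = 256 and L = 4 the totals are 768KB of tables, 5KB of buffers and 773KB overall, with a latency of 5121 clocks after six stages.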

The main advantage of this architecture is that it requires almost no glue logic and is a simple flow through
architecture. The disadvantages are that the number of chips required scales with the number of stages and board area
required becomes quite large. The capacity of each chip in this architecture is quite small and one can easily buy a
cheap memory chip which has the capacity of all these chips combined. This is considered in the architecture of FIGURE
9.

Referring now to FIGURE 9, if all the tables are loaded onto one memory chip 100, then an address generator 102
and a frame buffer 104 are used. In this architecture one stage of the WWHVQ is computed at a time. In fact, each
sub-level of a stage is computed one at a time. So in the encoder 30 the table lookups for table i.1 are done first, and the

N^2/2

values that result are stored in the (half) frame buffer 104. These values are then used as input for computing the table
lookups of table i.2 and so on. Clearly, the most frame storage needed is:

N^2/2

bytes. The frame buffer 104 also permits simple row-column alternation between stages.

The address generator 102 has to generate two addresses, one for the table memory chip 106 and another for the
frame memory chip 104. The address for the frame memory chip 104 is generated by a simple incrementer, since the
access is uniform. If the number of taps in the wavelet filter is L, then the number of levels per stage (i.e., wavelet tables)
for the encoder 30 is log L (it is log L/2 for the decoder). Each of the tables i.1 and i.2 is of size tsize. Suppose the output of
level q of stage m is being computed. If the output of the frame memory 104 is y(i), then the address that the address generator
102 computes for the table memory 106 is y(i) + offset, where offset = [(m - 1) log L + (q - 1)] * tsize. This is assuming the
tables i.1, i.2 are stored in an ordered manner. The multiplications involved in the computation of offset need not be
done since offset can be maintained as a running total. Each time one level of computation is completed, offset = offset
+ tsize. Thus, the address generator 102 is any suitable device such as an incrementer, an accumulator, or an adder
plus some combinational glue logic.
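The running-total form of the offset can be checked against the closed-form expression with a brief Python sketch. It assumes base-2 logarithms and tables stored consecutively in computation order; the function names are hypothetical.

```python
import math

def table_offsets(stages, l, tsize):
    """List (m, q, offset) in computation order using a running total."""
    levels = int(math.log2(l))      # levels per stage for the encoder
    offset = 0
    result = []
    for m in range(1, stages + 1):
        for q in range(1, levels + 1):
            result.append((m, q, offset))
            offset += tsize         # one addition per completed level
    return result

def table_address(y, offset):
    """Address presented to the table memory for frame-memory output y."""
    return y + offset
```

Every offset produced this way equals [(m - 1) log L + (q - 1)] * tsize, so no multiplier is needed in the address generator.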

The total memory requirements for this architecture are:

T log L + N^2/2

bytes spread out over two chips, the frame memory 104 and the table memory 106. For the example considered in the
previous section this translates to 800KB of memory (768KB + 32KB). The throughput is one output every clock cycle.
The latency per stage is:

(N^2/2^(m-1)) log L

where m is the stage number. The first stage has a latency of just log L. Thus, the latency after m stages is:

N^2 (1 - (1/2^(m-1))) log L + 1
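The memory and latency expressions for the single-chip architecture can be evaluated with the Python sketch below. It assumes base-2 logarithms and the same example values as before (T of 384KB, N = 256, L = 4); the function names are hypothetical, and integer arithmetic is used to avoid rounding.

```python
import math

def single_chip_memory(t_bytes, n, l):
    """T log L + N^2/2 bytes: table memory plus half-frame memory."""
    return t_bytes * int(math.log2(l)) + n * n // 2

def single_chip_latency(m, n, l):
    """N^2 (1 - 1/2^(m-1)) log L + 1 clocks after m stages."""
    levels = int(math.log2(l))
    return (n * n - n * n // 2 ** (m - 1)) * levels + 1
```

For the example this gives 800KB of memory and 126977 clocks after six stages; converting clocks to time requires a clock rate, which the text does not specify.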

The advantages of this architecture are its scalability, simplicity and low chip count. By using a large memory chip
100 (approximately 2MB or more), various configurations of the encoder and the decoder, i.e., various table sizes and
precision, may be considered and the number of stages can be scaled up or down without having to change anything
on the board. The only disadvantage is latency, but in practice the latency is well below 50 milliseconds, which is the
threshold above which humans start noticing delays.

It is important to note the connection between the requirement of a half frame memory 104 and the latency. The
latency is there primarily due to the fact that all the outputs of a stage are computed before the next stage computation
begins. These intermediate outputs are stored in the frame memory 104. The reason the latency was low in the previous
architecture was that the computation proceeded in a flow through manner, i.e., the outputs of stage m began to be
computed before all the outputs of stage m - 1 were computed.

FIGURE 10 illustrates the architecture for the decoder similar to the encoder of FIGURE 8. As shown, an address
generator 90' is connected to a buffer 60' having inputs of the odd and even tables 52 and 54. The output of the buffer
connects to the next stage and the address generator also connects to the next buffer.

FIGURE 11 shows the architecture for the decoder similar to the encoder of FIGURE 9. As illustrated, the interframe
coder 80 is simply placed at the input to the chip 100', as opposed to the output.

The memory chips utilized to facilitate the look-up tables of the present invention are preferably read only memories
(ROMs). However, it is recognized that other suitable memory means such as PROM, EPROM, EEPROM, RAM, etc.
may be utilized.



Claims 



1. A method for compressing and transmitting data, the method comprising steps of:

receiving the data;

successively performing multiple stages of first lookup operations to obtain compressed data at each stage
representing vector quantized discrete subband coefficients; and,
transmitting the compressed data to a receiver.

2. The method according to claim 1 further comprising:

receiving the compressed data at the receiver;

successively performing multiple stages of second lookup operations to selectively obtain at each stage before
a last stage decompressed data representing a partial inverse subband transform of the compressed data.

3. The method of claim 1 or 2, wherein i lookup stages are performed and the compressed data is 2^i : 1 compressed
data, where i is an integer.

4. The method according to claim 1, 2 or 3, wherein the subband transform coefficients comprise discrete wavelet
transform coefficients.

5. The method of any of claims 1 to 4, further comprising transcoding the compressed data after the transmitting at a
gateway to the receiver to obtain further compressed data.

6. A method adaptable for use on signal data compressed by performing a subband transform followed by a vector
quantization of the signal data, comprising steps of:

receiving the compressed signal data;

successively performing multiple stages of lookup operations to selectively obtain at each stage before a last
stage partially decompressed data representing a partial subband transform of the compressed signal data.

7. The apparatus for compressing and transmitting data, comprising:

means for receiving the data;

means for successively performing multiple stages of first lookup operations to obtain compressed data at
each stage representing vector quantized discrete subband coefficients; and, means for transmitting the compressed
data to a receiver.

8. The apparatus according to claim 7, further comprising:

means for receiving the compressed data at the receiver; and

means for successively performing multiple stages of second lookup operations to selectively obtain at each
stage before a last stage decompressed data representing a partial inverse subband transform of the compressed
data.

9. An apparatus adaptable for use on signal data compressed by performing a subband transform followed by a vector
quantization of the signal data, comprising:

means for receiving the compressed signal data;

means for successively performing multiple stages of lookup operations to selectively obtain at each stage
before a last stage partially decompressed data representing a partial subband transform of the compressed signal
data.

10. A programmable apparatus for compressing and receiving data, when suitably programmed for carrying out the
method of any of claims 1 to 6.